Reconfigurable computing is a computer architecture that combines some of the flexibility of software with the high performance of hardware by processing with flexible, high-speed computing fabrics such as field-programmable gate arrays (FPGAs). The principal difference when compared with ordinary microprocessors is the ability to make substantial changes to the datapath itself, in addition to the control flow. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs), is the possibility of adapting the hardware during runtime by "loading" a new circuit onto the reconfigurable fabric.
The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's landmark paper proposed the concept of a computer made of a standard processor and an array of "reconfigurable" hardware.[1][2] The main processor would control the behavior of the reconfigurable hardware. The latter would then be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware; unfortunately, the idea was far ahead of the electronic technology needed to realize it.
In the 1980s and 1990s there was a renaissance in this area of research with many proposed reconfigurable architectures developed in industry and academia,[3] such as: COPACOBANA, Matrix, Garp,[4] Elixent, PACT XPP, Silicon Hive, Montium, Pleiades, Morphosys, PiCoGA.[5] Such designs were feasible due to the constant progress of silicon technology that let complex designs be implemented on one chip. The world's first commercial reconfigurable computer, the Algotronix CHS2X4, was completed in 1991. It was not a commercial success, but was promising enough that Xilinx (the inventor of the Field-Programmable Gate Array, FPGA) bought the technology and hired the Algotronix staff.[6]
Table 1: Nick Tredennick's Paradigm Classification Scheme

| Machine paradigm | Resources | Programming source (resources) | Algorithms | Programming source (algorithms) |
|---|---|---|---|---|
| Early historic computers | fixed | none | fixed | none |
| von Neumann computer | fixed | none | variable | Software (instruction streams) |
| Reconfigurable computing systems | variable | Configware (configuration) | variable | Flowware (data streams) |
Computer scientist Reiner Hartenstein describes reconfigurable computing in terms of an anti machine that, according to him, represents a fundamental paradigm shift away from the more conventional von Neumann machine.[7] Hartenstein calls it the Reconfigurable Computing Paradox that software-to-configware migration (software-to-FPGA migration) results in reported speed-up factors of up to more than four orders of magnitude, as well as reductions in electricity consumption of up to almost four orders of magnitude, even though the technological parameters of FPGAs are behind the Gordon Moore curve by about four orders of magnitude and their clock frequencies are substantially lower than those of microprocessors. This paradox is due to a paradigm shift, and is also partly explained by the Von Neumann syndrome.
The fundamental model of the reconfigurable computing machine paradigm, the data-stream-based anti machine, is best illustrated by its differences from the machine paradigms introduced earlier, as summarized in Nick Tredennick's classification scheme of computing paradigms (see Table 1: Nick Tredennick's Paradigm Classification Scheme).[8]
The fundamental model of a reconfigurable computing machine, the data-stream-based anti machine (also called Xputer), is the counterpart of the instruction-stream-based von Neumann machine paradigm. This is illustrated by a simple reconfigurable system (not dynamically reconfigurable), which performs no instruction fetch at run time; the reconfiguration (before run time) can be considered a kind of super instruction fetch. An anti machine does not have a program counter. Instead it has data counters, since it is data-stream-driven. Here the definition of the term data stream is adopted from the systolic array community, which defines at which time which data item has to enter or leave which port of the reconfigurable system; the system may be fine-grained (e.g. using FPGAs), coarse-grained, or a mixture of both.
The systolic array community, originally (in the early 1980s) consisting mainly of mathematicians, defined only one half of the anti machine: the data path, i.e. the systolic array (see also super systolic array). It did not define or model the data sequencing methodology, considering it outside its scope to specify where the data streams come from or end up. The data sequencing part of the anti machine is modeled as distributed memory, preferably on chip, consisting of auto-sequencing memory (ASM) blocks. Each ASM block has a sequencer that includes a data counter. An example is the Generic Address Generator (GAG), which is a generalization of the DMA controller.
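The data-counter idea can be sketched in ordinary software. The following C model of a single ASM block is only illustrative: the names asm_block and datapath, the strided address generator, and the test data are assumptions made for this sketch, not details of any particular Xputer or GAG implementation. It shows how a configured address sequence, rather than a fetched instruction stream, drives execution.

    #include <stdio.h>

    #define MEM_WORDS 16

    /* Hypothetical model of one auto-sequencing memory (ASM) block:
     * a data counter plus a simple strided, GAG-like address generator.
     * The address sequence is fixed by configuration, not by fetched
     * instructions -- there is no program counter in this model.       */
    typedef struct {
        int memory[MEM_WORDS]; /* local on-chip memory of the block     */
        int base;              /* configured start address              */
        int stride;            /* configured address increment          */
        int count;             /* data counter: how many items to issue */
    } asm_block;

    /* Stand-in for the configured datapath (here it just scales data). */
    static int datapath(int operand) {
        return 3 * operand;
    }

    int main(void) {
        asm_block src = { .base = 0, .stride = 2, .count = 8 };
        for (int i = 0; i < MEM_WORDS; i++)
            src.memory[i] = i * i;              /* fill the local memory */

        /* The data counter, not a program counter, drives execution:
         * each step the address generator produces the next address and
         * the data item read there is streamed through the datapath.   */
        int address = src.base;
        for (int step = 0; step < src.count; step++) {
            int item = src.memory[address];
            printf("step %d: mem[%d] = %d -> %d\n",
                   step, address, item, datapath(item));
            address += src.stride;      /* generic address generation    */
        }
        return 0;
    }

In a full anti machine, several such ASM blocks would run in parallel, each feeding or draining one port of the reconfigurable data path.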
Problem: We are given two character arrays of length 256, A[] and B[]. We need to compute the array C[] such that C[i] = B[B[B[B[B[B[B[B[A[i]]]]]]]]], i.e. eight successive lookups into B starting from A[i]. Although this problem is hypothetical, similar problems exist that have practical applications.
Consider a software solution (C code) for the above problem:
    for (int i = 0; i < 256; i++) {
        char a = A[i];              /* start from A[i]              */
        for (int j = 0; j < 8; j++)
            a = B[a];               /* eight chained lookups into B */
        C[i] = a;
    }
This program will take about 256*10*CPI cycles on the CPU, where CPI is the number of cycles per instruction: each of the 256 elements needs roughly ten instructions (one load from A, eight lookups into B, and one store to C).
Now, consider the hardware implementation shown here, say on an FPGA. Here, one element from the array 'A' is 'streamed' into the circuit by a microprocessor every cycle. The array 'B' is implemented as a ROM, perhaps in the BRAMs of the FPGA. The wires going into the ROMs labelled 'B' are the address lines, and the wires coming out are the values stored in the ROM at that address. The blue boxes are registers used for storing temporary values. Clearly, this is a pipeline, and after the 8th cycle it outputs one useful C[i] value per cycle. Hence the output is also a 'stream'.
The hardware implementation takes 256+8 cycles. Hence, we can expect a speedup of about 10*CPI over the software implementation. In practice, however, the speedup is much smaller, because the FPGA's clock is much slower than the CPU's.
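Since the hardware is described only at block-diagram level, the following C program is merely a cycle-by-cycle software model of the eight-stage lookup pipeline, not an HDL implementation; the register/valid bookkeeping and the test data are assumptions made for this sketch. It reproduces the 256 + 8 cycle count and checks the streamed results against the plain software loop.

    #include <stdio.h>
    #include <string.h>

    #define N       256   /* number of elements in A, B and C        */
    #define STAGES    8   /* pipeline depth: eight chained B-lookups */

    int main(void) {
        unsigned char A[N], B[N], C[N];

        /* Arbitrary test data; unsigned char keeps every value a
         * valid index into B (a choice made for this model only).   */
        for (int i = 0; i < N; i++) {
            A[i] = (unsigned char)i;
            B[i] = (unsigned char)((7 * i + 3) & 0xFF);
        }

        /* regs[s] models the register after pipeline stage s;
         * valid[s] says whether that register holds real data yet.  */
        unsigned char regs[STAGES];
        int valid[STAGES];
        memset(regs, 0, sizeof regs);
        memset(valid, 0, sizeof valid);

        int fed = 0, produced = 0, cycles = 0;

        /* Each loop iteration models one clock cycle.               */
        while (produced < N) {
            cycles++;

            /* Output side: once the pipeline is full, one finished
             * C value leaves the last register every cycle.         */
            if (valid[STAGES - 1])
                C[produced++] = regs[STAGES - 1];

            /* Shift: each stage applies one more B-lookup as the
             * data advances by one register.                        */
            for (int s = STAGES - 1; s > 0; s--) {
                regs[s]  = B[regs[s - 1]];
                valid[s] = valid[s - 1];
            }

            /* Input side: the microprocessor streams one element of
             * A into the first stage per cycle until A is exhausted.*/
            if (fed < N) {
                regs[0]  = B[A[fed++]]; /* first of the eight lookups */
                valid[0] = 1;
            } else {
                valid[0] = 0;
            }
        }

        /* Cross-check against the straightforward software version. */
        for (int i = 0; i < N; i++) {
            unsigned char a = A[i];
            for (int j = 0; j < STAGES; j++)
                a = B[a];
            if (a != C[i]) {
                printf("mismatch at %d\n", i);
                return 1;
            }
        }
        printf("all %d results match after %d simulated cycles (256 + 8 = %d)\n",
               N, cycles, N + STAGES);
        return 0;
    }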